Abstract—Combinatorial Interaction Testing has been applied 
to event-driven software systems by using as test suite a set of 
sequences of inputs in desired combinations. This is generally 
called combinatorial sequence testing (CST). CST requires possi- 
bly new system models from which tests are generated and new 
test generation methods (or an adaptation of the classical ones). 
Finite State Machines (FSMs) can easily represent event-based 
systems where certain inputs are valid only in some states and 
such constraints can be represented by the incompleteness of the 
FSM. In this paper, we propose an approach to CST where tests 
are generated from FSMs which are represented by automata 
together with test requirements. First, automata can be used to 
check if test sequences contain invalid inputs. We propose three 
methods to repair tests with invalid inputs. Moreover, we can
directly embed the system constraints over the inputs into the
automata during generation, so that only valid test sequences
are generated. We compare our automata-based method with the
standard approach of Sequence Covering Arrays (SCAs), which
produces a set of sequences, all with the same length, each being
a permutation of all the events supported by the system. We found
that generating only valid tests from automata provides several 
advantages w.r.t. repairing tests and SCAs. 

Index Terms—Test Sequence Generation; Sequencing Con- 
straint; T-way Sequence Coverage; Sequence Testing; Event- 
based Testing; Combinatorial Testing; Constrained Combinato- 
rial Testing 


I. INTRODUCTION 


Combinatorial interaction testing (CIT) has been an active 
area of research for many years, since it has proven to be 
very effective to test complex systems with multiple input 
parameters. The authors of [23] count several research groups
that actively work on CIT, and many other recent groups
and tools are not considered in that paper, while [16] analyzes
many of the algorithms and tools available for CIT. Recently,
the CIT approach has proven to be effective not only to test
a system by varying the values of its input parameters, but
also to test combinations of events in event-driven software.
In this case, the extension of CIT to sequences of events
is also referred to in the literature as combinatorial sequence
testing (CST) [10], [13], [18], [27]. CST can be successfully
applied to test event-driven software or systems ([3]-[5],
[15], [21], [28], [35]). A common technique for CST consists
in exploiting sequence covering arrays (SCAs) for testing.
Given a set of events E, an SCA of strength t is a set of
permutations of all the events in E such that each sequence
of t distinct elements of E is a subsequence of at least one of
the permutations.

Among system inputs there are typically several constraints, 
and the failure revealing ability of CIT methods might be 
significantly reduced if the system has to comply with con- 
straints and the test suite generator does not take them into 
account. For this reason, several approaches have been extended to
Constrained Combinatorial Testing [17], but all of them
focus on input testing. For event-driven software there are
constraints over the events that can be used as inputs during 
testing and, also in this case, considering the constraints over 
the events during generation can increase the efficiency of 
testing. For instance, a SUT may require that a given event 
read must appear after another event open and if a test does 
not meet this constraint, the test is invalid, and it cannot be 
applied. 

In this paper, we present a method that can be used to 
generate test sequences for the events of incomplete Finite 
State Machines (FSMs), by taking into account also their 
constraints. Our method exploits the representation of FSMs 
using the automata notation. We have decided to focus on 
incomplete FSMs because they represent the situation in which 
the constraints of events are stricter, since a particular event 
may not be defined in some state. Moreover, we support
FSMs in the form of Mealy machines (Fig. 1), which are a rather
general form of FSM. We also introduce three
different approaches to repair invalid test sequences, i.e.,
sequences generated without taking into account the constraints
imposed by the FSM of the system.

We have found that generating sequences with our method,
in compliance with the constraints imposed by the FSM of
the system, leads to a greater number of valid sequences
and to a better coverage of states, event tuples, and
transitions.

The paper is structured as follows. In Sect. II we provide 
some necessary background about the combinatorial testing, 




Fig. 1. Example of Mealy machine. a/0 means that when the input is a, 
the FSM produces the output 0 and it moves to the target state shown by the 
arrow. 


the FSMs and the constraints that we have to satisfy in
our System Under Test (SUT). Our automata-based sequence
generation method is explained in Sect. III and its application
is evaluated in Sect. IV. Sect. V reviews some works related to
the application of combinatorial sequence testing and constrained
combinatorial testing, and Sect. VI concludes the paper.


II. BACKGROUND 


In this paper we generate sequences of events to test
event-driven software [30]. This kind of software can be well
described by using Finite State Machines and, in particular,
Mealy machines, since they allow us to model not only the states
and the input events, but also the output events.


Definition 1 (Mealy machine). A Mealy machine F is a 6-tuple
$(S, s_0, \Sigma, \Lambda, T, G)$ in which:


• $S$ is a finite set of states.

• $s_0 \in S$ is the initial state of the machine F.

• $\Sigma$ is a finite set that represents the input alphabet.

• $\Lambda$ is a finite set that represents the output alphabet.

• $T : S \times \Sigma \to S$ is the transition function that maps
pairs of a state and an input symbol to the corresponding next
state.

• $G : S \times \Sigma \to \Lambda$ is the output function that maps
pairs of a state and an input symbol to the corresponding output
symbol.


Since we aim to deal with real systems, we have to define 
our combinatorial sequence generation method for incomplete 
FSMs. 


Definition 2 (Complete and incomplete FSM). Given an FSM
$F(S, s_0, \Sigma, \Lambda, T, G)$, we say that F is a complete
machine iff for all $s \in S$ and for all $e \in \Sigma$ the
transition function $T(s, e)$ and the output function $G(s, e)$
are defined. Contrariwise, we say that F is an incomplete machine
iff there exist a state $s \in S$ and an event $e \in \Sigma$ for
which the transition function $T(s, e)$ is not defined (and
neither is $G(s, e)$).


Example 1. The FSM in Fig. 1 is an incomplete machine,
because T is not defined for the input symbols b and c in the
state $s_0$, for the input symbol a in the state $s_1$, and for the
symbols a, b, and c in the state $s_2$.


In the following pages we will refer to input symbols as 
events to use the same nomenclature as the one used in event- 
driven software. 


A. Combinatorial sequence testing of FSMs 


While in classical combinatorial testing we are interested 
in covering the interaction among a fixed set of inputs [19], 
each with a given set of possible values, in combinatorial 
sequence testing (CST) [18] of FSMs we focus on covering 
the interaction of inputs taken from a unique set (the input 
alphabet) but provided to the machine in different orders. This 
requires redefining a test as a sequence of inputs of
variable length (in other CST approaches, tests are
still organized in Sequence Covering Arrays and all have
the same length). In our approach, a test is a finite sequence
of events $(e_1, e_2, \ldots, e_n)$, all belonging to $\Sigma$.


Definition 3 (Combinatorial sequence coverage). We say
that a test suite achieves t-way combinatorial sequence
coverage iff, for every tuple of t inputs and every possible order
of them, there exists a test sequence in which these t inputs
occur in that order (allowing extra inputs to be interleaved
among the elements of the tuple).


With the standard pairwise CIT, a test suite covers for 
each pair of input parameters all the possible combinations 
of values. Pairwise CIT can be extended to t-wise CIT when 
tuples of length t are considered instead of simple pairs. In 
our case, we want to generate sequences of events covering 
each tuple of t events. 

Most event-driven software can be represented using
an incomplete FSM, since in some states some events cannot
be fired. This representation implicitly defines some
constraints on the FSM, meaning that only some test sequences
are valid, while others are not.


Definition 4 (Valid test sequence). Given an FSM
$F(S, s_0, \Sigma, \Lambda, T, G)$ as per Definition 1, let
$ts = (e_1, e_2, \ldots, e_n)$ be a test sequence composed of
n events. Assume that $ts^i$ is the list of the events
in ts from $e_1$ to $e_i$ and $s(ts^i)$ is the state reached
starting from the initial state $s_0$ by applying all the events in
$ts^i$. We call ts a valid test sequence iff, for all $e_i \in ts$,
$e_i$ can be fired starting from the state $s(ts^{i-1})$, i.e.,
$T(s(ts^{i-1}), e_i)$ and $G(s(ts^{i-1}), e_i)$ are both defined.


Example 2. Consider the initial state $s_0$ of the
example in Fig. 1, in which only the event a can be fired. The test
sequence (b, b, a) is an invalid test sequence, because the event
b is not defined in the initial state. Contrariwise, for the same
example, the test sequence (a, b, a) is a valid test sequence.
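
Definition 4 can also be read operationally. The following Python sketch replays a test sequence on an incomplete Mealy machine and reports whether it is valid. The transition table is a hypothetical reconstruction of Fig. 1 derived from Examples 1 to 3: the output produced by c is not given in the text and is assumed here, and the names FIG1 and is_valid are illustrative only.

```python
# Hypothetical reconstruction of the Mealy machine of Fig. 1.
# Keys are (state, event); values are (next_state, output).
# The output attached to event "c" is an assumption.
FIG1 = {
    ("s0", "a"): ("s1", "0"),
    ("s1", "b"): ("s0", "1"),
    ("s1", "c"): ("s2", "0"),
}


def is_valid(fsm, initial_state, test_sequence):
    """Return True iff every event is defined in the state where it is fired."""
    state = initial_state
    for event in test_sequence:
        if (state, event) not in fsm:
            # T and G are undefined for this (state, event) pair.
            return False
        state, _output = fsm[(state, event)]
    return True


valid = is_valid(FIG1, "s0", ["a", "b", "a"])      # sequence of Example 2
invalid = is_valid(FIG1, "s0", ["b", "b", "a"])    # b is undefined in s0
```

Under these assumptions the two calls reproduce the outcome of Example 2: the first sequence is accepted and the second one is rejected at its first event.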


For this reason, for an incomplete FSM, we may be unable 
to cover all the tuples of events because some of them can be 
covered only by sequences that are invalid. 


Example 3. In the example of Fig. 1, the pair of c followed
(not necessarily immediately) by a cannot be covered by any valid
test sequence.


III. COMBINATORIAL SEQUENCES GENERATION 


Having introduced what we mean by combinatorial sequence
testing and coverage, we can now introduce our
sequence generation method. One could extend classical
combinatorial testing algorithms in order to generate SCAs, and
this has been done for example in [18]. However, in our
approach we are not bound to have all the tests of the same
length (usually the number of the events), so classical methods
that build covering arrays may not be well suited, and we
decided to devise an automata-based approach.

First, we introduce an automaton representing a t-wise 
permutation of t events. 


Definition 5 (T-wise automaton). Given a permutation p of t
events $(e_1, e_2, \ldots, e_t)$, the automaton built as in Fig. 2 is
called t-wise automaton. We call automaton(p) the function that
builds the t-wise automaton that represents the tuple p.


Fig. 2. Example of automaton for the recognition of the sequence
$(e_1, e_2, \ldots, e_t)$


A t-wise automaton A can be used to check if a sequence
s covers the tuple it represents: if s is accepted by A, then
the tuple is covered.
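
As a minimal sketch of this acceptance check, the Python function below simulates a deterministic version of the t-wise automaton: its state is the number of tuple elements matched so far, and a sequence is accepted when all t elements have been seen in order, possibly with other events interleaved. The function name covers is illustrative and is not taken from the authors' implementation.

```python
def covers(sequence, tuple_):
    """Simulate the t-wise automaton of `tuple_` on `sequence`.

    The automaton state is the length of the prefix of `tuple_` matched
    so far; any event that is not the next expected one leaves the
    state unchanged (it is an interleaved extra input).
    """
    state = 0
    for event in sequence:
        if state < len(tuple_) and event == tuple_[state]:
            state += 1
    # Accepting state: every element of the tuple has been matched in order.
    return state == len(tuple_)


covered = covers(["a", "c", "b", "d"], ("a", "b"))   # a ... b, with c in between
not_covered = covers(["b", "a"], ("a", "b"))         # wrong order
```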

If there are n events, there exist $\frac{n!}{(n-t)!}$ permutations
and hence as many t-wise automata.

Exploiting the operations among automata we can build a
test suite for CST as shown in Alg. 1.

The algorithm is a typical one-test-at-a-time test generator.
At the beginning it builds an empty automaton A and then it
tries to add many t-wise automata to it in a random order.
We say that it collects multiple t-wise automata in a single
automaton. At the end, any string that can be derived from
A is a test that covers all the permutations from which the
t-wise automata are built. In particular we use the function
string(A) that returns the shortest string accepted by the
automaton A. We allow the user to set a limit N of automata to
be collected together: in this way the user can favor few long
sequences (high N) or many short sequences (low N). The
effects of the variation of the parameter N will be analysed
in Sect. IV.

This approach is similar to the one presented in [6], where a
logical context and an SMT solver are used to collect tuples in
order to generate tests for classical constrained combinatorial
interaction testing.

This standard algorithm, however, could generate invalid
test sequences. So, we have devised three different approaches
to repair invalid tests:


• Reject_not_valid (REJ): if a sequence contains an event
that is invalid at the time in which it is applied, the whole
sequence is rejected.

• Stop_at_error (STP): if a sequence contains an event that
is invalid at the time in which it is applied, the sequence
is executed only until the error is reached. The following
events are not tested.


Algorithm 1 Algorithm for test generation
Require: I the set of events
Require: t the strength of the tests
Require: N the max number of tuples for each test sequence
Ensure: TS the test suite for CST
    T ← t-permutations of I
    TS ← ∅
    i ← 0
    A ← empty automaton
    while T ≠ ∅ do
        p ← a random element in T
        a ← automaton(p)
        if a ∩ A ≠ ∅ then
            A ← a ∩ A
            T ← T − {p}
            i ← i + 1
            if i > N then
                ADDTEST(TS, A)
                i ← 0
            end if
        end if
    end while
    ADDTEST(TS, A)

    procedure ADDTEST(TS, A)
        TS ← TS + string(A)
        A ← empty automaton
    end procedure


• Skip_error (SKP): if a sequence contains an event that
is invalid at the time in which it is applied, the single
event is skipped, and the following events are executed.
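
The three repair approaches can be sketched with one Python function that returns the events actually executed on the system. This is only an illustration under stated assumptions: the FSM is a dictionary from (state, event) to the next state, the table FIG1 is a hypothetical reconstruction of Fig. 1, and the function name repair is not taken from the authors' code.

```python
# Hypothetical reconstruction of Fig. 1: (state, event) -> next state.
FIG1 = {("s0", "a"): "s1", ("s1", "b"): "s0", ("s1", "c"): "s2"}


def repair(fsm, initial_state, sequence, strategy):
    """Return the events that are executed under REJ, STP or SKP."""
    state = initial_state
    executed = []
    for event in sequence:
        if (state, event) in fsm:
            state = fsm[(state, event)]
            executed.append(event)
        elif strategy == "REJ":
            return []          # the whole sequence is rejected
        elif strategy == "STP":
            break              # stop at the first invalid event
        elif strategy == "SKP":
            continue           # skip only the invalid event
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return executed


seq = ["a", "a", "b", "a"]     # the second "a" is invalid in state s1
rej = repair(FIG1, "s0", seq, "REJ")
stp = repair(FIG1, "s0", seq, "STP")
skp = repair(FIG1, "s0", seq, "SKP")
```

On this sequence REJ executes nothing, STP executes only the first a, and SKP executes a, b, a, which illustrates why the three strategies can lead to very different coverage for the same generated test.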


A. Generation of only valid tests 


In order to avoid the generation of invalid tests, we modify 
the algorithm as presented in Alg. 2. In this new version of 
the generation algorithm, called CNST, we collect the t-wise 
automata not starting from an empty automaton but from the 
automaton that accepts only valid sequences of inputs for the 
FSM under test (lines 4 and 24). Moreover, the FSM may
never accept a given permutation of t events, and in this case
the tuple is said to be infeasible.


Example 4. For the Mealy machine in Fig. 1, the tuple a—c—b 
is infeasible, because the input symbol b cannot be accepted 
after the first two symbols. 


Since there is no valid test that covers infeasible tuples, it is
important to detect and discard them from the requirements.
This is done in the algorithm at line 16, which checks whether a
tuple p that cannot be collected with the current automaton
can instead be collected with the automaton containing only
the constraints of the FSM (automaton(F)). If p cannot be
collected even with automaton(F), then the
tuple is infeasible.
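
The feasibility check can be illustrated without an automata library. The Python sketch below explores the product of the FSM and the t-wise automaton of a tuple with a breadth-first search: it returns a shortest valid sequence covering the tuple, or None when the intersection is empty, i.e., when the tuple is infeasible. It stands in for the emptiness test and the string function of the paper only under the assumption that the FSM is given as a dictionary from (state, event) to the next state; FIG1 is a hypothetical reconstruction of Fig. 1 and all names are illustrative.

```python
from collections import deque

# Hypothetical reconstruction of Fig. 1: (state, event) -> next state.
FIG1 = {("s0", "a"): "s1", ("s1", "b"): "s0", ("s1", "c"): "s2"}


def shortest_valid_cover(fsm, initial_state, events, tuple_):
    """BFS over pairs (FSM state, number of tuple elements matched)."""
    start = (initial_state, 0)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (state, matched), path = queue.popleft()
        if matched == len(tuple_):
            return path                      # tuple covered by a valid path
        for event in events:
            if (state, event) not in fsm:
                continue                     # event not allowed by the FSM
            step = 1 if event == tuple_[matched] else 0
            nxt = (fsm[(state, event)], matched + step)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [event]))
    return None                              # empty intersection: infeasible


events = ["a", "b", "c"]
feasible = shortest_valid_cover(FIG1, "s0", events, ("a", "b"))
infeasible = shortest_valid_cover(FIG1, "s0", events, ("a", "c", "b"))
```

With the reconstructed machine, the tuple a, c, b of Example 4 yields None, while the pair a, b is covered by the two-event sequence a, b.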


Algorithm 2 Algorithm for test generation
Require: I the set of events
Require: F the finite state machine
Require: t the strength of the tests
Require: N the max number of tuples for each test sequence
Ensure: TS the test suite for CST
 1: T ← t-permutations of I
 2: TS ← ∅
 3: i ← 0
 4: A ← automaton(F)                ▷ init A with the FSM automaton
 5: while T ≠ ∅ do
 6:     p ← a random element in T
 7:     a ← automaton(p)
 8:     if a ∩ A ≠ ∅ then
 9:         A ← a ∩ A
10:         T ← T − {p}
11:         i ← i + 1
12:         if i > N then
13:             ADDTEST(TS, A)
14:             i ← 0
15:         end if
16:     else if automaton(F) ∩ a = ∅ then    ▷ p is infeasible
17:         T ← T − {p}
18:     end if
19: end while
20: ADDTEST(TS, A)

22: procedure ADDTEST(TS, A)
23:     TS ← TS + string(A)
24:     A ← automaton(F)            ▷ init A with the FSM automaton
25: end procedure

Example 5. Fig. 3 shows the collecting operation between an
automaton representing the SUT and the pair 1 — 0. As the
figure shows, the resulting automaton can contain many more
states and transitions than the original one.
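
To make the constrained generation concrete, here is a small Python sketch in the spirit of the CNST algorithm. It is not the authors' implementation: instead of building intersection automata it searches the product of the FSM with one progress counter per collected tuple, which is why the product can grow quickly, as Example 5 notes. Two deviations are assumptions made for the sketch: a test is closed after at most n_max tuples, and a test is also closed as soon as no remaining tuple can be collected, so that the loop is guaranteed to terminate. FIG1 is a hypothetical reconstruction of Fig. 1.

```python
import random
from collections import deque
from itertools import permutations

# Hypothetical reconstruction of Fig. 1: (state, event) -> next state.
FIG1 = {("s0", "a"): "s1", ("s1", "b"): "s0", ("s1", "c"): "s2"}


def shortest_test(fsm, s0, events, tuples):
    """Shortest valid sequence covering every tuple in `tuples`, or None."""
    start = (s0, (0,) * len(tuples))
    goal = tuple(len(t) for t in tuples)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (state, progress), path = queue.popleft()
        if progress == goal:
            return path
        for e in events:
            if (state, e) not in fsm:
                continue                     # keep only valid sequences
            nxt_progress = tuple(
                k + 1 if k < len(t) and t[k] == e else k
                for k, t in zip(progress, tuples))
            nxt = (fsm[(state, e)], nxt_progress)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [e]))
    return None


def generate_cnst(fsm, s0, events, t, n_max, seed=0):
    """Generate only valid tests; infeasible tuples are discarded."""
    todo = list(permutations(events, t))
    random.Random(seed).shuffle(todo)
    suite = []
    while todo:
        collected, leftover = [], []
        for p in todo:
            if (len(collected) < n_max and
                    shortest_test(fsm, s0, events, collected + [p]) is not None):
                collected.append(p)          # p collected in the current test
            elif shortest_test(fsm, s0, events, [p]) is not None:
                leftover.append(p)           # feasible: retry in a later test
            # otherwise p is infeasible and is dropped
        if collected:
            suite.append(shortest_test(fsm, s0, events, collected))
        todo = leftover
    return suite


suite = generate_cnst(FIG1, "s0", ["a", "b", "c"], t=2, n_max=20)
```

For the reconstructed machine of Fig. 1 and pairwise strength, the four feasible pairs can be collected into a single valid test, while the pairs starting with c are dropped as infeasible.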


B. Monitoring 


To further optimize the generation, we can perform monitoring,
which consists in checking if a test generated for a set
of tuples accidentally covers other tuples as well. Algorithm
3 shows how monitoring works: once a test is generated,
all the tuples that are still not covered are checked against
the test. If a tuple is covered, then it is discarded. To
check if a tuple is covered by a test, we can check if the
automaton representing that tuple accepts the test sequence.
Note that while the collecting of Alg. 2 can be expensive,
since it requires the operation of intersection among automata,
monitoring is generally much faster since acceptance is easily
computed.
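
A minimal Python sketch of the monitoring step follows. It assumes that acceptance by a t-wise automaton is equivalent to the tuple occurring in the test as a (not necessarily contiguous) subsequence; the names covers and monitor are illustrative and not taken from the authors' code.

```python
def covers(test, tuple_):
    """True iff `tuple_` occurs in `test` in order, gaps allowed."""
    matched = 0
    for event in test:
        if matched < len(tuple_) and event == tuple_[matched]:
            matched += 1
    return matched == len(tuple_)


def monitor(test, uncovered):
    """Return the tuples that are still uncovered after running `test`."""
    return [p for p in uncovered if not covers(test, p)]


remaining = monitor(["a", "b", "a", "c"],
                    [("a", "b"), ("c", "a"), ("b", "c")])
```

Each check is a single linear scan of the test, which is consistent with the observation that monitoring is much cheaper than intersecting automata.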


Algorithm 3 Monitoring
    procedure ADDTEST(TS, A, T)
        test ← string(A)
        for all t ∈ T do
            if isAccepted(test, automaton(t)) then
                T ← T − {t}
            end if
        end for
        TS ← TS + test
        A ← automaton(F)
    end procedure


IV. METHOD EVALUATION


We use the dk.brics.automaton [22] Java package
to build automata representing the FSM of the whole
system and each tuple of events. The code we have
used to execute the method evaluation can be found in
the following public repository:
https://github.com/fmselab/FiniteStateMachineCombinatorial.

To evaluate our automata-based generation method for se- 
quence combinatorial testing of Finite State Machines, we 
have tested and analysed the coverage of the pair-wise test 
sequences over four different systems described using FSMs 
(see Table I): the IEEE 11073 PHD’s communication model 
(already analysed and tested with different approaches in [4] 
and [33]), a pattern matching system (for the recognition of the 
regular expression 01[0*]1), a simple elevator and a vault that
can be unlocked only by the combination "12345". As shown
in Table I, we use different values of N among the models 
since the PHD communication model is more complex than 
the others, and the intersection operation times out with N 
greater than 10 for it. 

We have represented all the benchmark systems using the
SMC (State Machine Compiler) standard language [26], which
allows one to express the behavior of a state machine and to
generate the corresponding classes in many different languages
by using the included compiler.
Listing 1 shows an example of the SMC description of the
vault benchmark, where char1...char5 are the events fired by
the FSM when a number is pressed.

The results of the evaluation of our methods, by executing 
the test generation process 10 times for every combination of 
options, are reported in Table II. 

In particular, we are interested in answering the following 
research questions: 


RQ1 How does the sequence generation time correlate
    with the size of the system, depending on the
    method?

RQ2 How does the CNST method impact the number of valid
    sequences and coverage w.r.t. the other methods
    (REJ, SKP, and STP)?

RQ3 How does the monitoring optimization influence the
    coverage of the sequences?

RQ4 How does the number of pairs covered by the
    sequences correlate with the value chosen for the
    parameter N?

RQ5 How does the sequence generation time correlate
    with the value chosen for the parameter N?
(a) Automaton of the system (b) Automaton of the pair 1 — 0 (c) Intersection among the two previous automatons 


Fig. 3. Intersection process among automata for the pattern recognition system 


Listing 1. Example of SMC description of the vault benchmark 


%class Vault
%package examples

%start MainMap::Idle
%map MainMap

%%
Idle
{
    char1 First_number { no_response(); }
}
First_number
{
    char2 Second_number { no_response(); }
}
Second_number
{
    char3 Third_number { no_response(); }
}
Third_number
{
    char4 Fourth_number { no_response(); }
}
Fourth_number
{
    char5 unlocked { no_response(); }
}
%%
TABLE I
BENCHMARKS

                                 | PHD Communication Model | Pattern Recognition | Elevator | Vault
# Automata per test sequence (N) | 10                      | 20                  | 20       | 20
# Transitions                    | 65                      | 9                   | 6        | 5
# States                         | 5                       | 5                   | 4        | 6
# Events                         | 23                      | 2                   | 8        | 5
# Event pairs                    | 529                     | 4                   | 64       | 25
# Valid event pairs              | 484                     | 4                   | 25       | 10
# Event triples                  | 12,167                  | 8                   | 512      | 125
# Valid event triples            | 10,648                  | 8                   | 125      | 10


RQ6 How does the total length of sequences correlate with
    the value chosen for the parameter N?

RQ7 Is our method better than the standard sequence
    generation method based on SCAs?


TABLE II
METHOD EVALUATION (PAIRWISE TESTING)

Benchmark    | Monitoring | Method | # Seq. | Max len. | Min len. | Avg len. | Total len. | # Valid seq. | # Pairs cov. | # States cov. | # Transitions cov. | Time
PHD          | NO         | CNST   | 41     | 20       | 11       | 17       | 708        | 41           | 484          | 5             | 51                 | 428.40
PHD          | NO         | SKP    | 45     | 20       | 2        | 15       | 693        | 0            | 270          | 5             | 39                 | 135.60
PHD          | NO         | REJ    | 45     | 19       | 2        | 15       | 701        | 0            | 0            | 0             | 0                  | 144.24
PHD          | NO         | STP    | 45     | 20       | 2        | 15       | 692        | 0            | 49           | 2             | 12                 | 150.80
PHD          | YES        | CNST   | 41     | 21       | 15       | 17       | 723        | 41           | 484          | 5             | 51                 | 474.92
PHD          | YES        | SKP    | 45     | 20       | 2        | 15       | 686        | 0            | 271          | 5             | 39                 | 185.08
PHD          | YES        | REJ    | 45     | 18       | 2        | 15       | 695        | 1            | 1            | 1             | 2                  | 131.00
PHD          | YES        | STP    | 45     | 18       | 2        | 15       | 701        | 0            | 55           | 3             | 16                 | 168.05
Pattern rec. | NO         | CNST   | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 5             | 4                  | 0.07
Pattern rec. | NO         | SKP    | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 5             | 4                  | 0.06
Pattern rec. | NO         | REJ    | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 5             | 4                  | 0.06
Pattern rec. | NO         | STP    | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 4             | 3                  | 0.07
Pattern rec. | YES        | CNST   | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 5             | 4                  | 0.07
Pattern rec. | YES        | SKP    | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 5             | 4                  | 0.06
Pattern rec. | YES        | REJ    | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 5             | 4                  | 0.07
Pattern rec. | YES        | STP    | 1      | 4        | 4        | 4        | 4          | 1            | 4            | 4             | 3                  | 0.07
Elevator     | NO         | CNST   | 1      | 12       | 12       | 12       | 12         | 1            | 25           | 4             | 6                  | 0.55
Elevator     | NO         | SKP    | 3      | 13       | 11       | 12       | 36         | 0            | 14           | 4             | 6                  | 74.33
Elevator     | NO         | REJ    | 3      | 13       | 12       | 12       | 37         | 0            | 0            | 0             | 0                  | 65.50
Elevator     | NO         | STP    | 3      | 13       | 11       | 12       | 36         | 0            | 0            | 2             | 1                  | 79.08
Elevator     | YES        | CNST   | 1      | 12       | 12       | 12       | 12         | 1            | 25           | 4             | 6                  | 0.58
Elevator     | YES        | SKP    | 3      | 14       | 11       | 12       | 36         | 0            | 8            | 4             | 5                  | 62.25
Elevator     | YES        | REJ    | 3      | 13       | 11       | 12       | 37         | 0            | 0            | 0             | 0                  | 99.37
Elevator     | YES        | STP    | 3      | 12       | 10       | 11       | 34         | 0            | 3            | 4             | 4                  | 87.16
Vault        | NO         | CNST   | 1      | 5        | 5        | 5        | 5          | 1            | 10           | 6             | 5                  | 0.21
Vault        | NO         | SKP    | 2      | 10       | 5        | 7        | 15         | 1            | 10           | 6             | 5                  | 2.43
Vault        | NO         | REJ    | 2      | 10       | 4        | 7        | 14         | 0            | 0            | 0             | 0                  | 2.10
Vault        | NO         | STP    | 2      | 10       | 5        | 7        | 15         | 0            | 6            | 5             | 4                  | 1.87
Vault        | YES        | CNST   | 1      | 5        | 5        | 5        | 5          | 1            | 10           | 6             | 5                  | 0.21
Vault        | YES        | SKP    | 2      | 9        | 5        | 7        | 14         | 0            | 10           | 6             | 5                  | 1.40
Vault        | YES        | REJ    | 2      | 9        | 5        | 7        | 14         | 0            | 0            | 0             | 0                  | 1.80
Vault        | YES        | STP    | 2      | 10       | 4        | 7        | 14         | 0            | 10           | 6             | 5                  | 1.50


A. RQ1: Sequence generation time and system size
By observing the generation time¹ of the sequences in Table
II, we can see that for small systems (such as the elevator) the
time required by CNST is much smaller than the time required
by the others. The reason is that repairing the sequences
takes significantly more time than generating them.
Contrariwise, for systems that have many events, CNST is the
slowest, since generating the sequences in compliance with the
constraints of the FSM requires more time than repairing
them. In this case, building the intersection among
automata is time consuming, since the automata must contain the
system constraints from the beginning. However, CNST leads
to better results in terms of coverage, as discussed below.


¹Experiments have been run on a computer with 14GB of RAM and an
Intel® Core™ i5-750 CPU


TABLE III
EVALUATION OF THE RESULTS OBTAINED WITH DIFFERENT GENERATION
METHODS

     | % Valid Seq. | % Pairs Cov. | % States Cov. | % Transitions Cov.
CNST | 100.00       | 100.00       | 100.00        | 77.65
SKP  | 3.92         | 55.50        | 100.00        | 62.94
REJ  | 2.94         | 0.86         | 27.50         | 5.88
STP  | 1.96         | 12.52        | 75.00         | 28.24


TABLE IV
COVERAGE WITH AND WITHOUT MONITORING DEPENDING ON THE
GENERATION METHODS

              | CNST   | SKP    | REJ    | STP
No monitoring | 92.55% | 73.50% | 10.16% | 33.27%
Monitoring    | 92.55% | 72.79% | 12.67% | 43.90%


B. RQ2: Coverage and valid sequences with CNST 


As can be seen from the results in Table III, our method 
CNST, that generates test sequences following the constraints 
imposed by the FSM of the system, leads to better (or equal) 
results than the other approaches: 


• The percentage of valid sequences is higher. In many
cases, other methods do not produce valid sequences. In
those cases, we must repair the sequences (with one of
the three proposed approaches) to still perform testing.

• The overall coverage (event pairs, states and transitions)
is higher or the same for CNST compared to the other
methods, because we can execute all the sequences since
they contain only valid events.


Note that the pairs coverage is computed only over the 
number of feasible pairs because some of them cannot be 
covered due to the constraints imposed by the system. 


C. RQ3: Monitoring 


By comparing the results obtained without the monitoring
optimization and the ones obtained with it (Table IV),
we can see that the methods that
involve the repair of the sequences generally have a
better or equal coverage when monitoring is executed.
This is reasonable because without monitoring we have more
sequences that can fail and, in some cases, when the test
sequence is invalid we have to stop its execution before its
termination. Also considering all the methods together (Table
V), the monitoring optimization always produces better results.

Moreover, in these experiments, the monitoring optimization
has shown not to be time consuming, so we expect that applying
it is a good choice.


TABLE V
COVERAGE WITH AND WITHOUT MONITORING - AVERAGE AMONG
BENCHMARKS AND GENERATION METHODS

              | % Pairs Cov. | % States Cov. | % Transitions Cov.
No monitoring | 42.25        | 72.5          | 42.35
Monitoring    | 42.69        | 78.75         | 45.00


Fig. 4. Number of pairs covered with different values for the parameter N
when the SKP method is used


D. RQ4: Correlation between the number of covered pairs 
and N 


When the sequence repairing process is used, the number of
pairs covered is also influenced by the value chosen for the
parameter N (number of automata per batch). Figure 4 (in the
case of the PHD communication model) shows that, for the
SKP repair method, the number of covered pairs has a
growing trend with increasing N. This happens because if we
have long sequences, we can include more pairs in them and,
since we skip the events that are invalid, we can cover more
pairs. On the other hand, Figure 5 (in the case of the PHD
communication model) shows that, for the STP repair
method, the number of covered pairs has a decreasing trend
with the growth of N, because having long sequences means
that, when an invalid event is reached, we stop the execution
of the whole sequence, so we do not execute a lot of events. A
similar behavior can be observed by using REJ. Contrariwise,
if the CNST method is used, the number of covered pairs
remains constant when N varies, because all the pairs that
are added in a test sequence satisfy all the constraints. This
means that, for the SKP method, a big N can improve the coverage,
while for STP and REJ it is better to have many short tests.


E. RQ5: Correlation between generation time and N


The tester can arbitrarily choose the value of N depending
on the generation and repair method chosen. Nevertheless,
it is important to consider that the sequence generation
time increases exponentially with N, because
the intersection of N automata usually requires much more
time than the intersection of N − 1 automata, especially
for rather high values of N. Fig. 6 shows the correlation
between generation time and N when the constraints are
not considered in the generation phase (in the example the
STP method is used on the PHD benchmark). Even when
considering the constraints during the generation phase (CNST
method on the PHD benchmark) the correlation between N
and the generation time is exponential (see Fig. 7). However,
increasing N leads to smaller test suites, as shown in the
following RQ.


Fig. 5. Number of pairs covered with different values for the parameter N
when the STP method is used

Fig. 6. Sequence generation time [s] with different values for the parameter
N when the STP and pairwise testing are used


F. RQ6: Correlation between the length of the sequences and
N


Increasing the number of automata per test (N)
obviously leads to a decrease in the number of sequences.
Moreover, as N increases, the total number of events in the test
suite decreases too. Fig. 8 shows that the sum of the lengths
of all the sequences decreases when N increases. This is
reasonable because the longer the sequence, the more
likely it is that we can avoid repeating the first event
of the pair we want to test. Even if Fig. 8 shows the plot
for the CNST method, we have verified that the same trend
also holds for the methods that repair the sequences.


[Plot: x-axis, automata per test sequence (N), from 1 to 10]

Fig. 7. Sequence generation time [s] with different values for the parameter
N when the CNST and pairwise testing are used

Fig. 8. Total sequences length with different values for the parameter N
when the CNST method and pairwise testing are used


G. RQ7: Comparison between automata-based generation 
method and SCAs 


To compare the results obtained by our automata-based
method with the ones obtained with SCAs, we
have also computed the coverage information for 3-wise
testing (Table VI)². We have generated the SCAs for each of our
benchmarks by using the tool provided by [24].

SCAs are based only on permutations of the n events, thus
they do not consider the constraints of the system. For this
reason, we need to repair the sequences produced by
the SCA generator as well. As the standard approach with SCAs
produces sequences all with the same length (equal to the
number of the events considered), the total length of the test
sequences is shorter than that obtained with our
automata-based method. The sequence generation time is also
shorter, because computing all the permutations is less complex
than computing the intersection among automata.

Table VII shows the summary of the comparison between
the coverage obtained by our method and the one obtained by SCAs
(with different repairing procedures). The results confirm that
our automata-based method performs better than SCAs in
every analyzed aspect. Even comparing the coverage of the


²For the PHD benchmark we had to reduce the value of N due to the high
complexity of the system: with N > 6 3-wise automata, the generation times
out.


TABLE VI 
METHOD EVALUATION (3-WISE TESTING) 


* y 
z 5 S 
p v 

£ s | 3 ini 
A 5 E hA š] £ 
a = = z, $ 2| 2 
PHD NO CNST 6 1,775 | 22 | 10 
PHD NO SKP 6 1,521 | 22 | 12 
PHD NO REJ 6 1,521 | 21 12 
PHD NO STP 6 1,521 | 22 | 12 
PHD YES | CNST 6 1,775 | 23 | 11 
PHD YES SKP 6 1,521 | 21 12 
PHD YES REJ 6 1,521 | 21 12 
PHD YES STP 6 1,521 | 21 11 
Pattern rec. NO CNST 0 1 6 6 
Pattern rec. NO SKP 0 1 6 6 
Pattern rec. NO REJ 0 1 6 6 
Pattern rec. NO STP 0 1 6 6 
Pattern rec. | YES | CNST 0 1 6 6 
Pattern rec. | YES SKP 0 1 6 6 
Pattern rec. | YES REJ 0 1 6 6 
Pattern rec. | YES STP 0 1 6 6 
Elevator NO CNST 0 12 15 10 
Elevator NO SKP 0 43 16 11 
Elevator NO REJ 0 43 16 11 
Elevator NO STP 0 43 15 11 
Elevator YES | CNST 0 13 14 | 10 
Elevator YES SKP 0 43 16 | 11 
Elevator YES REJ 0 43 15 | 10 
Elevator YES STP 0 43 15 | 11 
Vault NO CNST 0 1 5 5 
Vault NO SKP 0 11 12 5 
Vault NO REJ 0 11 12 7 
Vault NO STP 0 11 11 6 
Vault YES | CNST 0 1 5 5 
Vault YES SKP 0 11 11 8 
Vault YES REJ 0 11 12 8 
Vault YES STP 0 11 12 9 


SCAs with the one got by repairing the sequences obtained 
by using automata-based approach, the results are the same 
(51.10% of overall coverage for automata-based method vs 
29.03% for SCA standard approach) excepts for the percentage 
of valid sequences, since SCAs generate fewer test sequences. 
The results tell that the automata-based method is overall 
better than SCAs method, and it is confirmed by the paired 
t-test [32] with: 
• H0: the two methods perform in the same way, in terms 
of coverage 
• 8 degrees of freedom 
• t = 3.8507 
• p-value = 0.004873 
• α = 0.05 

We can see that p-value < α, so we can reject the hypothesis 
H0 and claim that our method performs better. 
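
As an illustrative sketch (not the tooling used for the experiments), the decision above can be re-derived from the reported t statistic and degrees of freedom. The function names and the Simpson-rule integration are assumptions introduced here; the per-benchmark coverage pairs are not listed in this section, so only the reported t = 3.8507 with 8 degrees of freedom is used.

```python
import math

def paired_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for the paired differences xs[i] - ys[i]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] with Simpson's rule.

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# Values reported in the text: t = 3.8507, 8 degrees of freedom, alpha = 0.05.
p = two_sided_p_value(3.8507, 8)
print(f"p-value = {p:.6f}, reject H0: {p < 0.05}")
```

With the reported statistic the computed p-value is close to 0.0049, well below α = 0.05, which is consistent with rejecting H0.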


V. RELATED WORK 


Combinatorial interaction testing (CIT) has been shown to 
be an effective approach to managing the complexity of 
testing event-based software. In recent years, because of the 


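TABLE VI (continued)
[Column headers illegible in the source]
PHD          | NO  | CNST | 15 | 28,076 | 1,775 | 10,648 | 5 | 65 |  538.37
PHD          | NO  | SKP  | 16 | 25,198 |     0 |  7,075 | 5 | 65 |  981.22
PHD          | NO  | REJ  | 16 | 25,178 |     0 |      0 | 0 |  0 |  827.73
PHD          | NO  | STP  | 16 | 25,277 |     0 |  1,298 | 5 | 42 |  881.79
PHD          | YES | CNST | 15 | 28,032 | 1,775 | 10,648 | 5 | 65 | 7475.46
PHD          | YES | SKP  | 16 | 25,159 |     0 |  7,170 | 5 | 65 | 7754.58
PHD          | YES | REJ  | 16 | 25,165 |     0 |      0 | 0 |  0 | 7329.89
PHD          | YES | STP  | 16 | 25,194 |     0 |  1,263 | 5 | 39 | 7109.64
Pattern rec. | NO  | CNST |  6 |      6 |     1 |      8 | 5 |  5 |    0.09
Pattern rec. | NO  | SKP  |  6 |      6 |     1 |      8 | 5 |  5 |    0.10
Pattern rec. | NO  | REJ  |  6 |      6 |     1 |      8 | 5 |  5 |    0.08
Pattern rec. | NO  | STP  |  6 |      6 |     1 |      8 | 5 |  5 |    0.09
Pattern rec. | YES | CNST |  6 |      6 |     1 |      8 | 5 |  5 |    0.10
Pattern rec. | YES | SKP  |  6 |      6 |     1 |      8 | 5 |  5 |    0.09
Pattern rec. | YES | REJ  |  6 |      6 |     1 |      8 | 5 |  5 |    0.08
Pattern rec. | YES | STP  |  6 |      6 |     1 |      8 | 5 |  5 |    0.09
Elevator     | NO  | CNST | 12 |    147 |    12 |    125 | 4 |  6 |    3.39
Elevator     | NO  | SKP  | 12 |    554 |     0 |     65 | 4 |  6 |  162.74
Elevator     | NO  | REJ  | 13 |    561 |     0 |      0 | 0 |  0 |  146.56
Elevator     | NO  | STP  | 13 |    565 |     0 |      0 | 3 |  2 |  154.50
Elevator     | YES | CNST | 12 |    158 |    13 |    125 | 4 |  6 |    4.67
Elevator     | YES | SKP  | 13 |    565 |     0 |     65 | 4 |  6 |  161.99
Elevator     | YES | REJ  | 13 |    561 |     0 |      0 | 0 |  0 |  190.43
Elevator     | YES | STP  | 13 |    561 |     0 |      1 | 4 |  5 |  157.33
Vault        | NO  | CNST |  5 |      5 |     1 |     10 | 6 |  5 |    0.97
Vault        | NO  | SKP  |  9 |    108 |     0 |     10 | 6 |  5 |    5.47
Vault        | NO  | REJ  |  9 |    107 |     0 |      0 | 0 |  0 |    4.13
Vault        | NO  | STP  |  9 |    109 |     0 |     10 | 6 |  5 |    4.28
Vault        | YES | CNST |  5 |      5 |     1 |     10 | 6 |  5 |    1.26
Vault        | YES | SKP  |  9 |    107 |     0 |     10 | 6 |  5 |    4.08
Vault        | YES | REJ  | 10 |    111 |     0 |      0 | 0 |  0 |    4.58
Vault        | YES | STP  | 10 |    112 |     0 |      1 | 4 |  3 |    4.26
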
TABLE VII
COMPARISON BETWEEN SCAS AND AUTOMATA-BASED METHOD (3-WISE TESTING)

[Column headers illegible in the source]
Automata-based | CNST | 100.00% | 100.00% | 100.00% | 95.29% | 98.43%
Automata-based | SKP  |   0.06% |  66.77% | 100.00% | 95.29% |
Automata-based | REJ  |   0.06% |   0.07% |  25.00% |  5.88% | 51.10%
Automata-based | STP  |   0.06% |  12.00% |  92.50% | 62.35% |
SCAs           | SKP  |   2.17% |  26.44% |  85.00% | 57.65% |
SCAs           | REJ  |   2.17% |   0.00% |  15.00% |  2.35% | 29.03%
SCAs           | STP  |   2.17% |   0.15% |  50.00% | 24.17% |


growth in the number of software systems based on interaction 
with the user, combinatorial sequence testing (CST) has 
been used in many fields. 

The classical approach to CST requires the use of Sequence 
Covering Arrays (SCAs) [8], [29], [36], which provide a set of 
permutations of the events supported by the system. Many 
techniques have been proposed to generate this kind of array. 
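
To make the SCA notion concrete, the following minimal sketch checks whether a set of event sequences covers every ordering of t distinct events as a (not necessarily contiguous) subsequence. The helper names and the three-event example are hypothetical and are not taken from any of the cited generators.

```python
from itertools import permutations

def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    # `event in it` advances the iterator, so the order is enforced.
    return all(event in it for event in pattern)

def uncovered_orderings(sequences, events, t):
    """Return the t-event orderings that no sequence in `sequences` covers."""
    return [p for p in permutations(events, t)
            if not any(is_subsequence(p, s) for s in sequences)]

events = ["a", "b", "c"]
suite = [["a", "b", "c"], ["c", "b", "a"]]

# An empty result means the suite is a sequence covering array of strength 2.
print(uncovered_orderings(suite, events, 2))
# A single permutation misses the reversed pairs.
print(uncovered_orderings(suite[:1], events, 2))
```

Note that every row of an SCA is a permutation of all events, so no event can occur twice in the same test and the check knows nothing about which events the system actually accepts in a given state.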

In [1], the authors apply combinatorial-based event sequence 
methods to test Android applications, aiming to minimize 
the execution of events and maximize the coverage 
of event combinations. However, they use only a greedy 
algorithm and do not consider the constraints on the 
order of events imposed by the SUT. CST has also been used 
for browser fingerprinting in [14], where the authors show 
that combinatorial properties have an impact on browsers' 
behavior during the TLS handshake with a server, and in [31], 
where an interaction-based test sequence generation method 
for testing web applications is proposed. The TLS protocol has 
also been tested in [13], where weighted t-way sequences are 
used to derive sequence test cases. The methods 
described in [18] can be used for testing mission-critical 
systems that accept multiple inputs and generate outputs to 
several communication links, where it is important to test the 
order in which events occur. 


A different technique is used in [25], where the authors 
present a test suite generation technique that uses the 
Simulated Annealing (SA) metaheuristic search in a t-way 
generator called EDISTC-SA. 


The main problem in applying combinatorial testing to 
actual event-based systems is that, in many cases, 
some events can be fired only after a certain event has already 
been fired. This is why it is essential to also consider the 
constraints of the system while generating test sequences. The 
authors of [2] describe an approach to test suite generation for 
Constrained Combinatorial Testing (CCT) based on Answer 
Set Programming. In [7], the authors propose a solution for 
CCT in which the constraints on the input parameter values 
are expressed as logical predicates that can be solved by a 
formal logic tool. 


Another approach that deals with constraints in CIT 
is the IPOG-C algorithm [34], which includes optimizations to 
improve the performance of constraint handling. In [20], two 
novel algorithms to deal with constraints in CIT are presented: 
CCS (Construct Constraint Set) and CTWC (Combinatorial 
Testing With Constraint). The former computes implied 
constraints, while the latter uses the results of CCS to facilitate 
the test generation process. 


Recent software systems are highly configurable in 
terms of parameters, and CIT has become widely used in this 
research field. For example, the authors of [11] describe how 
CIT can be extended by adding new testing policies able 
to check whether the model correctly identifies the constraints 
among the various software parameters. 


In the methods described in this paper we have used FSMs 
to represent the constraints of the system. Other notations have 
been proposed in other papers. For example, in [9], the authors 
develop a notation for specifying sequencing constraints and 
present a t-way test sequence generation method that handles 
the constraints specified in this notation. The authors of 
[12] discuss automatic test sequence generation and coverage 
criteria for testing abstract state machines (ASMs). 


VI. CONCLUSIONS 


Testing event-based systems can be very challenging 
because most of them are described using incomplete Finite State 
Machines. As a result, a specific input may not be applicable 
when the system is in a particular state, or 
the response of the system to it may not be defined. 

Classical approaches to testing event-based systems use 
Sequence Covering Arrays (SCAs), but they do not consider 
the constraints imposed by the system, so some of the test 
sequences can be useless or in need of repair. 

In this paper, we have proposed a novel solution for test 
sequence generation that exploits FSMs and their 
representation through automata. The approach consists in using 
the SUT FSM, turned into an automaton, as a description of the 
constraints, and representing each tuple of events to be tested 
with a t-wise automaton. The intersection between the two 
kinds of automata, if it is not empty, produces another automaton 
that complies with the constraints of the system and covers 
the considered tuple. We have also devised three repairing 
methods that allow the execution of invalid test sequences, 
by rejecting the whole sequence, skipping the wrong event, or 
stopping at the first wrong event. 
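
The three repairing policies can be illustrated with a small sketch. The dictionary encoding of the FSM, the `repair` function, and the two-state door example are hypothetical and introduced only for illustration; in particular, representing a rejected sequence as an empty list is an assumption of this sketch.

```python
def repair(fsm, start, sequence, mode):
    """Repair `sequence` against an incomplete FSM.

    fsm maps each state to a dict {event: next_state}; a missing entry
    means the event is not enabled in that state.
    mode is "reject" (discard the whole sequence), "skip" (drop each
    invalid event and continue) or "stop" (truncate at the first one).
    """
    if mode not in ("reject", "skip", "stop"):
        raise ValueError(f"unknown repair mode: {mode}")
    state, kept = start, []
    for event in sequence:
        nxt = fsm.get(state, {}).get(event)
        if nxt is None:            # event not enabled in the current state
            if mode == "reject":
                return []          # assumption: rejected test becomes empty
            if mode == "stop":
                break
            continue               # mode == "skip"
        kept.append(event)
        state = nxt
    return kept

# Hypothetical incomplete FSM: "enter" and "close" are undefined when closed.
door = {
    "closed": {"open": "open"},
    "open": {"close": "closed", "enter": "open"},
}
test = ["open", "open", "enter", "close"]   # the second "open" is invalid
for mode in ("reject", "skip", "stop"):
    print(mode, repair(door, "closed", test, mode))
```

On this example, rejecting yields no test, skipping keeps `open, enter, close`, and stopping keeps only `open`, which shows why the three policies can lead to different coverage.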

Our method has shown better performance (in terms of 
coverage and valid sequences) than the standard SCA 
approach, even when the test sequences are repaired. 
Moreover, with the automata-based method we can 
also generate test sequences with multiple repetitions of the 
same event (for example by testing the pair e_i - e_i), while with 
SCAs only a single occurrence of each event is allowed. This is an 
important aspect because some systems can show malfunctions 
only when an event is repeated multiple times. Using automata 
to generate tests can be very time consuming and, for this 
reason, we have introduced a limit on the number of t-wise 
automata that can be collected together. 
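
The intersection-based generation summarized above, including a tuple that repeats the same event, can be sketched as a breadth-first search over the product of the FSM and a t-wise automaton. This is a simplified illustration under stated assumptions, not the implementation evaluated in the paper: every FSM state is treated as accepting, the t-wise automaton is assumed to accept any sequence that contains the tuple's events in order (not necessarily adjacent), and the door FSM and function name are hypothetical.

```python
from collections import deque

def shortest_covering_sequence(fsm, start, target):
    """Shortest valid event sequence that covers `target` in order.

    Product states are pairs (fsm_state, matched), where `matched` counts
    how many events of `target` have been seen so far. Returns None when
    the intersection is empty, i.e. the tuple cannot be covered.
    """
    initial = (start, 0)
    parent = {initial: None}          # product state -> (previous state, event)
    queue = deque([initial])
    while queue:
        node = queue.popleft()
        state, matched = node
        if matched == len(target):    # accepting state of the t-wise automaton
            seq = []
            while parent[node] is not None:
                node, event = parent[node]
                seq.append(event)
            return seq[::-1]
        for event, nxt in fsm.get(state, {}).items():
            step = matched + 1 if event == target[matched] else matched
            succ = (nxt, step)
            if succ not in parent:
                parent[succ] = (node, event)
                queue.append(succ)
    return None

# Hypothetical incomplete FSM used only for illustration.
door = {
    "closed": {"open": "open"},
    "open": {"close": "closed", "enter": "open"},
}
print(shortest_covering_sequence(door, "closed", ("enter", "enter")))
print(shortest_covering_sequence(door, "closed", ("close", "enter")))
print(shortest_covering_sequence(door, "closed", ("reset",)))
```

The first call covers a pair made of the same event twice, which a permutation-based SCA cannot express; the last call returns `None` because the event never occurs in the FSM, so the intersection is empty.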

As future work, we will try to apply this method to systems 
that do not have a well-known FSM structure, by applying a 
preprocessing procedure to automatically learn the behavior of 
the SUT and represent it using the FSM formalism or an 
automata-based representation. 


REFERENCES 


[1] D. Adamo, D. Nurmuradov, S. Piparia, and R. Bryce. Combinatorial-based 
event sequence testing of Android applications. Information and 
Software Technology, 99:98-117, jul 2018. 

[2] M. Banbara, K. Inoue, H. Kaneyuki, T. Okimoto, T. Schaub, T. Soh, 
and N. Tamura. catnap: Generating test suites of constrained 
combinatorial testing with answer set programming. In Logic Programming 
and Nonmonotonic Reasoning, pages 265-278. Springer International 
Publishing, 2017. 

[3] G. Becci, G. Dhadyalla, A. Mouzakitis, J. Marco, and A. D. Moore. 
Robustness testing of real-time automotive systems using sequence 
covering arrays. SAE International Journal of Passenger Cars - Electronic 
and Electrical Systems, 6(1):287-293, apr 2013. 

[4] A. Bombarda, S. Bonfanti, A. Gargantini, M. Radavelli, F. Duan, 
and Y. Lei. Combining model refinement and test generation for 
conformance testing of the IEEE PHD protocol using abstract state 
machines. In Testing Software and Systems, pages 67-85. Springer 
International Publishing, 2019. 

[5] R. C. Bryce, S. Sampath, and A. M. Memon. Developing a single 
model and test prioritization strategies for event-driven software. IEEE 
Transactions on Software Engineering, 37(1):48-64, jan 2011. 


[6] A. Calvagna and A. Gargantini. A logic-based approach to combinatorial 
testing with constraints. In B. Beckert and R. Hähnle, editors, Tests and 
Proofs, pages 66-83. Springer Berlin Heidelberg, 2008. 

[7] A. Calvagna and A. Gargantini. A formal logic approach to constrained 
combinatorial testing. Journal of Automated Reasoning, 45(4):331-358, 
apr 2010. 

[8] Y. M. Chee, C. J. Colbourn, D. Horsley, and J. Zhou. Sequence covering 
arrays. SIAM Journal on Discrete Mathematics, 27(4):1844-1861, jan 
2013. 

[9] F. Duan, Y. Lei, R. N. Kacker, and D. R. Kuhn. An approach to t-way 
test sequence generation with constraints. In 2019 IEEE International 
Conference on Software Testing, Verification and Validation Workshops 
(ICSTW). IEEE, apr 2019. 

[10] J. Ferrer, P. M. Kruse, F. Chicano, and E. Alba. Search based algorithms 
for test sequence generation in functional testing. Information and 
Software Technology, 58:419-432, feb 2015. 

[11] A. Gargantini, J. Petke, M. Radavelli, and P. Vavassori. Validation of 
constraints among configuration parameters using search-based 
combinatorial interaction testing. In Search Based Software Engineering, 
pages 49-63. Springer International Publishing, 2016. 

[12] A. Gargantini and E. Riccobene. ASM-based testing: Coverage criteria 
and automatic test sequence generation. Journal of Universal Computer 
Science, 7, 02 2003. 

[13] B. Garn, D. E. Simos, F. Duan, Y. Lei, J. Bozic, and F. Wotawa. 
Weighted combinatorial sequence testing for the TLS protocol. In 2019 
IEEE International Conference on Software Testing, Verification and 
Validation Workshops (ICSTW). IEEE, apr 2019. 

[14] B. Garn, D. E. Simos, S. Zauner, R. Kuhn, and R. Kacker. Browser 
fingerprinting using combinatorial sequence testing. In Proceedings of 
the 6th Annual Symposium on Hot Topics in the Science of Security - 
HotSoS '19. ACM Press, 2019. 

[15] C. S. Jensen, M. R. Prasad, and A. Møller. Automated testing with 
targeted event sequence generation. In Proceedings of the 2013 
International Symposium on Software Testing and Analysis - ISSTA 2013. 
ACM Press, 2013. 

[16] S. K. Khalsa and Y. Labiche. An orchestrated survey of available 
algorithms and tools for combinatorial testing. In 2014 IEEE 25th 
International Symposium on Software Reliability Engineering. IEEE, 
nov 2014. 

[17] T. Kitamura, A. Yamada, G. Hatayama, C. Artho, E.-H. Choi, N. T. B. 
Do, Y. Oiwa, and S. Sakuragi. Combinatorial testing for tree-structured 
test models with constraints. In 2015 IEEE International Conference on 
Software Quality, Reliability and Security. IEEE, aug 2015. 

[18] D. R. Kuhn, J. M. Higdon, J. F. Lawrence, R. N. Kacker, and Y. Lei. 
Combinatorial methods for event sequence testing. In 2012 IEEE 
Fifth International Conference on Software Testing, Verification and 
Validation. IEEE, apr 2012. 

[19] Y. Lei, R. Kacker, D. R. Kuhn, V. Okun, and J. Lawrence. IPOG: A 
general strategy for t-way software testing. In 14th Annual IEEE 
International Conference and Workshops on the Engineering of 
Computer-Based Systems (ECBS'07). IEEE, mar 2007. 

[20] L. Li, Y. Cui, and Y. Yang. Combinatorial test cases with constraints 
in software systems. In Proceedings of the 2012 IEEE 16th 
International Conference on Computer Supported Cooperative Work in Design 
(CSCWD). IEEE, may 2012. 

[21] N. Mirzaei, J. Garcia, H. Bagheri, A. Sadeghi, and S. Malek. Reducing 
combinatorics in GUI testing of android applications. In Proceedings of 
the 38th International Conference on Software Engineering - ICSE '16. 
ACM Press, 2016. 

[22] A. Møller. dk.brics.automaton - finite-state automata and regular 
expressions for Java, 2017. http://www.brics.dk/automaton/. 

[23] C. Nie and H. Leung. A survey of combinatorial testing. ACM 
Computing Surveys, 43(2):1-29, jan 2011. 

[24] NIST. NIST sequence covering array generator. 
https://csrc.nist.gov/Projects/automated-combinatorial-testing-for-software/event-sequence-testing/unders, 2016. 

[25] M. M. Rahman, R. R. Othman, R. Ahmad, and M. Rahman. A 
meta heuristic search based t-way event driven input sequence test 
case generator. International Journal of Simulation Systems, Science 
& Technology (IJSSST), 15:70-77, 01 2015. 

[26] C. W. Rapp. SMC the state machine compiler, 2019. 
http://smc.sourceforge.net/. 

[27] Y. Sheng, C. Sun, S. Jiang, and C. Wei. Extended covering arrays for 
sequence coverage. Symmetry, 10(5):146, may 2018. 

[28] D. E. Simos, L. Kampel, and M. Ozcan. Combinatorial methods 
for testing communication protocols in smart cities. In R. Battiti, 
M. Brunato, I. Kotsireas, and P. M. Pardalos, editors, Learning and 
Intelligent Optimization, pages 437-440, Cham, 2019. Springer 
International Publishing. 

[29] G. Tzanakis. Covering arrays from maximal sequences over finite fields. 

[30] F. Wagner, R. Schmuki, T. Wagner, and P. Wolstenholme. Modeling 
software with finite state machines: a practical approach. Auerbach 
Publications, 2006. 

[31] W. Wang, S. Sampath, Y. Lei, and R. Kacker. An interaction-based 
test sequence generation approach for testing web applications. In 2008 
11th IEEE High Assurance Systems Engineering Symposium. IEEE, dec 
2008. 

[32] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and 
A. Wesslén. Experimentation in Software Engineering. Springer Berlin 
Heidelberg, 2012. 

[33] L. Yu, Y. Lei, R. N. Kacker, D. R. Kuhn, R. D. Sriram, and K. Brady. 
A general conformance testing framework for IEEE 11073 PHD's 
communication model. In Proceedings of the 6th International Conference 
on PErvasive Technologies Related to Assistive Environments - PETRA 
'13. ACM Press, 2013. 

[34] L. Yu, Y. Lei, M. Nourozborazjany, R. N. Kacker, and D. R. Kuhn. 
An efficient algorithm for constraint handling in combinatorial test 
generation. In 2013 IEEE Sixth International Conference on Software 
Testing, Verification and Validation. IEEE, mar 2013. 

[35] X. Yuan, M. Cohen, and A. M. Memon. Covering array sampling of 
input event sequences for automated GUI testing. In Proceedings of 
the twenty-second IEEE/ACM international conference on Automated 
software engineering - ASE '07. ACM Press, 2007. 

[36] R. Yuster. Perfect sequence covering arrays. Designs, Codes and 
Cryptography, nov 2019. 

